Handbook of Research on Big Data and the IoT by Kaur Gurjit

Author: Kaur Gurjit
Language: eng
Format: epub
Publisher: Engineering Science Reference


4.1.3 Data Storage

HDFS (Hadoop Distributed File System), S3 (Simple Storage Service)

Servers: EC2, Google App Engine, Elastic Beanstalk, Heroku

4.1.4 Data Processing

R, Yahoo! Pipes, Mechanical Turk, Solr/Lucene, ElasticSearch, Datameer, BigSheets, Tinkerpop

We now examine two of the most popular Big Data processing frameworks, MapReduce and Hadoop, in detail.

4.2. MapReduce

MapReduce is a computational framework for processing large datasets with distributed algorithms running on clusters. It comprises user-defined Map and Reduce functions together with a MapReduce library. Map functions process the data in parallel; their output is sorted and then processed by Reduce functions. The MapReduce library parallelizes the work by breaking the data into smaller chunks that are processed using a master/slave implementation. Typically, the MapReduce framework is implemented in six steps, as follows.

Step 1: Read the input data from the Hadoop Distributed File System (HDFS).

Step 2: Split the job into smaller tasks.

Step 3: Input key/value pairs to Map function to generate intermediate key/value pairs.

Step 4: From the output of the Map function, identify and send all pairs with the same key to the Reduce function.

Step 5: Sort the input to the Reduce function by key.

Step 6: Write the reduced output into the HDFS.
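
The six steps above can be illustrated with a minimal, single-process word-count sketch in Python. This is not Hadoop code: an in-memory list of lines stands in for the HDFS input, the result is returned rather than written back to HDFS, and the names split_input, map_fn, shuffle, reduce_fn, and run_job are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def split_input(lines, chunk_size):
    """Step 2: break the input into smaller chunks, one per map task."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_fn(_key, line):
    """Step 3: turn one input pair (line number, line) into (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Step 4: collect all intermediate values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(key, values):
    """Reduce: combine every value seen for one key into a single count."""
    return key, sum(values)


def run_job(lines, chunk_size=2):
    # Step 1: a real job would read from HDFS; here `lines` is an
    # in-memory stand-in (assumption made for this sketch).
    chunks = split_input(lines, chunk_size)

    intermediate = []
    offset = 0
    for chunk in chunks:  # each chunk would run on a separate worker
        for i, line in enumerate(chunk):
            intermediate.extend(map_fn(offset + i, line))
        offset += len(chunk)

    groups = shuffle(intermediate)

    # Step 5: present the keys to the Reduce function in sorted order.
    results = [reduce_fn(key, groups[key]) for key in sorted(groups)]

    # Step 6: a real job would write the output to HDFS; this sketch
    # simply returns it.
    return results


if __name__ == "__main__":
    print(run_job(["big data big", "data and iot", "big iot"]))
```

Run on the three sample lines, the sketch prints [('and', 1), ('big', 3), ('data', 2), ('iot', 2)]. In an actual cluster the chunks would be mapped on different nodes and the grouping in Step 4 would move data across the network; the loop here only mimics that flow in one process.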
